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Abstract 

We propose a new GCD algorithm called Accelerated Euclidean Algorithm, or AEA 
for short, which matches the 0(nlog^ nloglogn) time complexity of Schonhage algo- 
rithm for n-bit inputs. This new GCD algorithm is designed for both integers and 
polynomials. We only focus our study to the integer case, the polynomial case is cur- 
rently addressed ([3]). 

1 Introduction 

The algorithm is based on a half-gcd like procedure, but unlike Schonhage's algorithm, it 
is iterative and therefore avoids the penalizing calls of the repetitive recursive procedures. 
The new half-gcd procedure reduces the size the integers at least a half word-memory bits 
per iteration only in single precision. By a dynamic updating process, we obtain the same 
recurrence and the same time performance as in the Schonhage approach. 

Throughout, the following notation is used. is a word memory, i.e.: W = 16, 32 or 64. 
Let u > V > 2 he positive integers where u has a n bits with n > 32. Given a non-negative 
integer x £ N, £{x) represents the number of its significant bits, not counting leading zeros, 
i.e.: i{x) = [log2(x-|-l)]. For the sake of readability integers U and V will be represented as 
a concatenation of / sets of W bits integers (except for the last set which may be shorter), 
with / = \n/W] , i.e.: if C/ = ^'^J 2*^ Ui_i with C/i and F = J2\zl ^-i> then 
we write symbollically \J = Ui •U2 ■ ■ ■ • Ui and V = Vi • V2 ■ ■ ■ • Vi. 
We use specific vectors which represent interval subsets from of U and V, by: 

Ui»Ui+l»--- •Uj \ f Ui 



= [V.. k':; ..... V;' j ^ ^I'l = - j ; for 1 < , < , < L 

Let M(x) be the cost of a multiplication of two x-bit integers. The function M{x) de- 
pends on the algorithm used to carry out the multiplications. The fast Schonhage-Strassen 
algorithm ([6]) performs these multiplications in M(x) = O(nlognloglogn). 



2 AEA: The Accelerated Euclidean Algorithm 

The following algorithm is based on two main ideas: 

• The computations are done in a Most Signifivant digit First (MSF) way. 

• Update by multiplying, with the current matrix, ONLY twice the number of leading 
bits we have already chopped, NOT the others leading bits. 
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Algorithm AEA. 

Input : u > V > 2 with u > 8W; n = i{u); 

Output : a 2 X 2 matrix M and {U', V) such that 

M X {U,V) = {U',V') and i{V') < £{U) - 2Li°g2(n/W')J-ivF. 

Begin 

1. {U,V) ^ {u,vy, s ^ [log^in/W)]; 

2. if ^(F) < \i{U)/2] + 1 return /; 

3. if e{V)> \i{U) /2']+l 

4. For i = 1 to 2*~i 

5. if Ui = or Vi (Regular case) 

6. h^O; 

7. if {i odd) Lo lLE{X[i..i + 1]); updateL(i, h); 

8. else 

9. Rq ^ lLE{X[i..i + 1]); updatei^(i, h); 

10. X <— i/2; ^ ^ + 1; 

11. While {x even) 

12. ^ X Lh-i; updateij(i, h); 

13. X ^ a;/2; <— /i + 1; 

14. Enwhile 

15. L/j ^ X L^_i; updateL(?,/i); 

16. Endelse 

17. else Irregular (i) {Ui / and = 0) ; 

18. EndFor 

19. Return and {U,V); 
End 

The algorithm ILE (borrowed from [7]) runs the extended Euclidean algorithm and stops 
when the remainder has roughly the half size of the inputs. 
Algorithm ILE. 

Input : Uq > ui >0 

Output : a 2 X 2 matrix M and {ui-i,Ui) such that M x {uo,ui) = {ui-i,Ui) 

and £{ui) < ^i{uo). 

Begin 

1. n = i{uo); p = £{ui); 



2. ifp < \n/2] + 1 return M = |^ ^ ^ J ; 

3. ifp > [n/2] + 1 

4. m = p — \n/2~\ — 1; 

5. Apply Extended Euclid Algorithm until \ai\ < 2™ < |ai-|-i 





End. 
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The functions update^ or update/j update not all the bits of the operands, but only the 
next useful small vectors, in order to get the next needed matrix. The irregular case is when 
Ui ^ and Vi = 0, i.e.: one or many components are all equal to zero. Roughly speaking, 
we perform an euclidean division in order to make an efficient reduction and continue our 
process. The aim is to preserve, at most, the general scheme of our process. The basic idea 
is to use full updated vectors, i.e.: vectors updated with all the previous matrices L^. 



3 An Example 

In order to carry out single-precision computations we take W = 20. Recall that the nota- 
tion ( ^ ] = ( j) means (d) = (?)x 2^+ ( j ) , where A, B, a, 6, c and 



^ B J \ b • d J \ B J \ h I \ d 

d are integers (c.f. notation in Section^. 

, , , u\ ( 922375420941 . 

Let = „ , tiien, with our notation, we obtain 



V J \ 707599307587 
^ \ _ ( \ _ ( 879645 \ 20 ( 785421 \ _ ( 879645 • 785421 
V ) ~ \ v ) ~ \ 674819 ) ^ V 299843 ) ~ \ 674819 • 299843 

We have Ui = 879645 and Vi = 674819 then (.{Ui) = l{Vi) = 20, n = p = 20 and 
mi = p — 1— [^] = 9. Thus, in order to compute ILE{Ui, Vi), we must stop the Extended 
Euclid Algorithm at \ai\ < 2^. We obtain the matrix ( \ai\ < 2^ = 512) 

Ni = ILEiUi,Vi) = (!ll -fM then: 



yi220 + V2 ~ \ 601 ^ V -160 ; ^ I 81257 



1204 \ 20 / 892378 
^441 J + V 81257 

Ui \ f 1204 \ ^ f U9 \ { 892378 
V,) = \AAl J ""H ^2 J = I 81257 

Now we can apply ILE to the new integers (less than 32 bits) U = 1263377882 and 
V = 462503273. We have i{U) = 31 and 1{V) = 29. We obtain m = 8 and the second 
matrix (in 32-bits single precision) 

/ _4i 110 

N2 = ILE{U,V) = ' 



and M = N2X Ni 



231 -631 

-41 112 \ / 369 -481 \ _ f -62729 81769 
231 -631 j ^ I -425 554 / ~ I 353414 -460685 



Moreover, at this level, we may consider that Ui and Vi are eliminated (chopped), even if 
U2 has some extra bits, and that U2 and V2 are updated as follows: 
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fU2\ / 1873414 \ / 1 .824838 \ ,,,, , „^ 

(y^ J = (725479 J = (0.725479 J = ^^^^) = 2i' ^(^^) = 2«- 

On the other hand, it is easy to check that the final matrix M = ( ] satisfies 

^ \ at bt J 

U \ _ f -62729 81769 \ / 922375420941 \ _ / 1873414 

V F y V 353414 -460685 )^\ 707599307587 ) ~\ 725479 

Now, if u and v were larger in size, namely: 

u \ _ ( 879645 • 785421 •uz •m ... •Uk 
V ) ~\ 674819 • 299843 • •va ... •Vk 

then we have to continue the half-gcd process. Since W bits have been already chopped, 
we have to update only the double, i.e.: multiply the next 2W leading bits of U and V by 
M, namely perform: 

f/3 • C/4 j M X ( * ] , and disregard all the other bits of U and V. 
V3» V4 J V V 3 • / 

Then do the same process as before with ( ^? ] instead of ( ^} t/^ ) chop 



V2»V3 J \Vi*V2 

the vector ( ^ j, and so on, repeating the process till we reach the middle of the size of U. 



4 An Example 

Let us consider two consecutive Fibonacci numbers {u,v) = (F^^jF^s), i.e.: 

F^g \ _ f 956722026041 \ _ f 956722 ^ ^^g} 026041 \ _ f 956722 • 026041 



F58 J V 591286729879 J \ 591286 / V 729879 J \ 591286 • 729879 
First, we must compute mMAX which gives 2"^^^-^ , the maximum size of the output ma- 
trix Li 

mMAX = £{v) - - 1 = 19. 

Let iUi,Vl) = (956722,591286) then i{Ui) = i{Vi) = 20, n = p = 20 and m = 
p — I — [^] = 9. We run the Extended Euclid algorithm and stops at \ai\ < 2^. We obtain 
the matrix Lq and the remainders (ri,r2): 

/ —144 233 \ 

Lo = ILE{UuVi)= I _3^^ j and (n, rs) = (1670, 1404). 



U2 

By update (1,0) we compute Lq ( y ) , so 
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/ C/i 109 + C/2 A / 1670 \ .r.9,( 166311903 \ ^( 1836 \ , ( 311903 
° V 10^ + ^2 y V 1404 y V -269096830 y V 1134 ) \ 903170 

where FIX occurs in the last equahty, hence the new values of Ui and Vi: 
Ui\ ( 1836 \ . ( U2\ ( 311903 
)^\ 1134 ) ^'''^\V2 ) ^ \ 903170 

We apply ILE to the new integers U = 1836311 and V = 1134903. We have e{U) = 21 and 
i{V) = 21. Repeating the same process as before, we obtain m = 9, the second matrix Rq 
and remainders (ri.r2) 

(2S3 —377 \ 
-377 610 ) '''''^ (r-i,r2) = (2032, 1583). 

Since L\ = Rq x Lq and using update (2,0) we obtain 

lJz ]=bJ \>iif^i ) io»+i^ r ?»n = ( -f'f? ^ io»+ "^^"^^ 



Thus Li 
and Li = i?o X Lq = 



y / ^ V 1134903 y " V 170 y V 1583 J \ -236731 

F32 

F31 

-144 233 \ _ / -121393 196418 
233 -377 / ~ I 196418 -317811 



u 




2178309 


V 


)-{ 


1346269 


( 


233 


-377 \ 




-377 


610 j 



We can apply one more Euclid step because the matrix Q x Li still satifies \ai\ < 
2mMAX ^ = 524288. So after one Euclid step we finally obtain: 

U \ _ f 1346269 \ _ f F31 
V ) ~[ 832040 ) ~ [Fso 



Q X L 

and after Li = Q x Li, we have: 



196418 -317811 
-317811 514229 

Moreover, at this level, we may consider that Ui and Vi are eliminated (chopped), even if 
U2 has some extra bits, and that U2 and V2 are updated as follows: 

C/2 \ _/ 1873414 \ _/ 1.824838 Y f(V)-20 
V2 )-[ 725479 )-[o. 725479 ) ' ^^^'> " ^^^'^ " 

On the other hand, it is easy to check that the final matrix M = ( | satisfies 

\ at h J 

f U \ _ f -62729 81769 \ / 922375420941 \ _ / 1873414 
V y y V 353414 -460685 J ^ \ 707599307587 ) ~ \ 725479 

Now, if u and v were larger in size, namely: 



u \ _ f 879645 • 785421 •us • ... • Uk 
V ) ~ \ 674819 • 299843 • f 3 • V4 ... • t;^ 

then we have to continue the half-gcd process. Since W bits have been already chopped, 
we have to update only the double, i.e.: multiply the next 2W leading bits of U and V by 
M, namely perform: 
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(^f ] <— M X ( I , and disregard all the other bits of U and V. 

V3»V4 J V ^3*^4 y 

Then do the same process as before with ( ^? t/^ ] instead of ( ^} ta^ ] chop 



the vector ^ ^ , and so on, repeating the process till we reach the middle of the size of U. 

Here we stress that this is the main difference between our approach and the other Soren- 
son's like algorithms, where all the bits of U and V are updated by multiplying them with 
the matrix M. In our approach, we only update the double of what we have already chopped. 



5 Remarks 

Unlike the recursive versions of GCD algorithms (([1]),[4]), our aim is not to balance the 
computations on each leaf of the binary tree computations, but to make full of single 
precision everytime, computing therefore the maximum of quotients in single precision. 
This lead to different computations each step in the algorithm AEA and the other recursive 
GCD algorithms. Moreover the fundamental difference between AEA and Schonhage's 
approach is that AEA deals straightforward with the most significant leading bits first 
(MSF computation). Consequently, we can stop the algorithm AEA at any time and 
still obtain the leading bits of the result. Thus AEA is a strong MSF algorithm and can 
be considered for "on line" arithmetics, where all the basic operations can be carried out 
simultaneously as soon as only the needed bits are available. This new algorithm should 
be an alternative to the Schonhage GCD algorithm. On the other hand, the derived GCD 
algorithm deals with many applications where long euclidean divisions are needed. We have 
identified and started to study many applications such as subresultants and Cauchy index 
computation, Pade-approximates or LLL-algorithms. 
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